Abstract: Two familiar notions of correlation are rediscovered
as extreme operating points for simulating a
discrete memoryless channel, in which a channel output 
is generated based only on a description of the channel 
input. Wyner's "common information" coincides with the 
minimum description rate needed. However, when common 
randomness independent of the input is available, the 
necessary description rate reduces to Shannon's mutual 
information. This work characterizes the optimal tradeoff 
between the amount of common randomness used and the 
required rate of description. 

I. Introduction 

What is the intrinsic connection between correlated 
random variables? How much interaction is necessary to 
create correlation? 

Many fruitful efforts have been made to quantify 
correlation between two random variables. Each quantity 
is justified by the operational questions that it answers. 
Covariance dictates the mean squared error in linear esti- 
mation. Shannon's mutual information is the descriptive 
savings from side information in lossless source coding 
and the additional growth rate of wealth due to side 
information in investing. Gács and Körner's common
information [1] is the number of common random bits 
that can be extracted from correlated random variables. 
It is less than mutual information. Wyner's common 
information [2] is the number of common random bits 
needed to generate correlated random variables and is 
greater than mutual information. 

This work provides a fresh look at two of these quantities:
mutual information and Wyner's common information
(herein simply "common information"). Both
are extreme points of the channel simulation problem,
introduced as follows: An observer (encoder) of an i.i.d.
source $X_1, X_2, \ldots$ describes the sequence to a distant random
number generator (decoder) that produces $Y_1, Y_2, \ldots$
(see Figure 1). What is the minimum rate of description
needed to achieve a joint distribution that is statistically 
indistinguishable (as measured by total variation) from 
the distribution induced by putting the source through a 
memoryless channel? 

Channel simulation is a form of random number
generation. The variables $X^n$ come from an external
source, and $Y^n$ are generated to be correlated with
$X^n$. The channel simulation is successful if the total
variation between the resulting distribution of $(X^n, Y^n)$
and the i.i.d. distribution that would result from passing
$X^n$ through a memoryless channel is small. This is
a strong requirement. It is stricter than the requirement
that $(X^n, Y^n)$ be jointly typical, as in the coordinated
action work of Cover and Permuter [3]. This total
variation requirement means that any hypothesis test that
a statistician comes up with to determine whether $X^n$
was passed through a real memoryless channel or the
channel simulator will be virtually useless.

Fig. 1. A discrete memoryless channel is simulated by two separate
processors, F and G. The first processor, F, observes X, and the
second processor, G, generates Y after receiving a message at rate R
from F. The minimum rate needed is the common information of X and
Y.

Wyner's result implies that in order to generate $X^n$
and $Y^n$ separately as an i.i.d. source pair they must
share bits at a rate of at least the common information
$C(X;Y)$ of the joint distribution. In the channel
simulation problem these shared bits come in the form
of the description of $X^n$.¹ However, the "reverse Shannon
theorem" of Bennett and Shor [4] suggests that a
description rate of the mutual information $I(X;Y)$ of
the joint distribution is all that is needed to successfully
simulate a channel. How can we resolve this apparent
contradiction?

The work of Bennett and Shor assumes that common
random bits, or common randomness, independent of the
source $X^n$ are available to the encoder and decoder.
In that setting, the common randomness provides a
second connection between the source $X^n$ and output
$Y^n$, in addition to the description of $X^n$. Remarkably,
even though it is independent of the source $X^n$,
the common randomness assists in generating correlated
random numbers and allows for description rates smaller
than the common information $C(X;Y)$.

¹To achieve channel simulation with a rate as low as the common
information one must change Wyner's relative entropy requirement in
[2] to a total variation requirement as used in this work.

Fig. 2. A discrete memoryless channel is simulated by two separate
processors, F and G. The first processor, F, observes X and common
randomness independent of X at rate $R_2$. The second processor, G,
generates Y based on the common randomness and a message at rate
$R_1$ from F.

In this work, we characterize the tradeoff between the
rate of available common randomness and the required
description rate for simulating a discrete memoryless
channel for a fixed input distribution, as in Figure 2.
Indeed, the tradeoff region of Section III confirms the
two extreme cases. If the encoder and decoder are
provided with enough common randomness, sending
$I(X;Y)$ bits per symbol suffices. On the other hand,
in the absence of common randomness one must spend
$C(X;Y)$ bits per symbol.

This result has implications in cooperative game the- 
ory, reminiscent of the framework investigated in [5]. 
Suppose a team shares the same payoff in a repeated 
game setting. An opponent tries to anticipate and exploit 
patterns in the team's combined actions, but a secure line 
of communication is available to help them coordinate. 
Of course, each player could communicate his random- 
ized actions to the other players, but this is an excessive 
use of communication. A memoryless channel is a useful 
way to coordinate their random actions. Thus, common 
information is found in Section VII to be the significant
quantity in this situation. 

II. Preliminaries and Problem Definition 
A. Notation 

We represent random variables as capital letters, $X$,
and their alphabets are written in script, $\mathcal{X}$. Sequences
$X_1, \ldots, X_n$ are indicated with a superscript, $X^n$. Distribution
functions, $p_X(x)$, are usually abbreviated as $p(x)$
when there is no confusion.

Accented variables, such as $\tilde{X}$, indicate different variables for
each accent, but their alphabets are all the same, $\mathcal{X}$.
Similarly, distribution functions written with an accent
or different letter, such as $\tilde{p}(x)$ versus $p(x)$, represent
different distributions.

Markov chains, satisfying $p(x,y,z) = p(x,y)p(z|y)$,
are represented with dashes, $X - Y - Z$.


(Wyner's) common information:

$$C(X;Y) \triangleq \min_{X - U - Y} I(X,Y;U).$$

Conditional common information:

$$C(X;Y|W) \triangleq \min_{X - (U,W) - Y} I(X,Y;U|W).$$

Total variation distance:

$$\|p(x) - q(x)\|_1 \triangleq \sum_{x} |p(x) - q(x)|.$$
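
As a minimal sketch of how these quantities can be evaluated numerically, the following Python computes the $\ell_1$ distance between two probability mass functions and the mutual information of a joint distribution. The function names and the dictionary representation are illustrative assumptions, and the erasure-channel numbers are a hypothetical example rather than part of the definitions.

```python
import math

def l1_distance(p, q):
    """Sum over the alphabet of |p(s) - q(s)| for pmfs given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), pr in joint.items():
        px[x] = px.get(x, 0.0) + pr
        py[y] = py.get(y, 0.0) + pr
    return sum(pr * math.log2(pr / (px[x] * py[y]))
               for (x, y), pr in joint.items() if pr > 0)

# Hypothetical example: Bernoulli-half input through an erasure channel
# with erasure probability 0.75 (the symbol "e" marks an erasure).
pe = 0.75
bec_joint = {(0, 0): 0.5 * (1 - pe), (0, "e"): 0.5 * pe,
             (1, 1): 0.5 * (1 - pe), (1, "e"): 0.5 * pe}

# Product of the marginals, i.e. X and Y made independent.
y_marginal = ((0, 0.5 * (1 - pe)), ("e", pe), (1, 0.5 * (1 - pe)))
independent = {(x, y): 0.5 * prob_y for x in (0, 1) for y, prob_y in y_marginal}

tv = l1_distance(bec_joint, independent)   # distance from independence
mi = mutual_information(bec_joint)         # equals 1 - pe for this channel
```

Here the distance is computed without a factor of 1/2, matching the $\|\cdot\|_1$ notation used in this section.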

B. Problem Specific Definitions 

A source $X^n$ is distributed i.i.d. according to $p(x)$.
A description of the source at rate $R_1$ is represented
by $I \in \{1, \ldots, 2^{nR_1}\}$. A random variable $J$, uniformly
distributed on $\{1, \ldots, 2^{nR_2}\}$ and independent of $X^n$,
represents the common random bits at rate $R_2$ known
at both the encoder and decoder. The decoder generates
a channel output $Y^n$ based only on $I$ and $J$.

The channel being simulated has the conditional
distribution $q(y|x)$; thus the desired joint distribution is
$p(x)q(y|x)$.

Definition 1: A $(2^{nR_1}, 2^{nR_2}, n)$ channel simulation
code consists of a randomized encoding function,

$$F_n : \mathcal{X}^n \times \{1, 2, \ldots, 2^{nR_2}\} \to \{1, 2, \ldots, 2^{nR_1}\},$$

and a randomized decoding function,

$$G_n : \{1, 2, \ldots, 2^{nR_1}\} \times \{1, 2, \ldots, 2^{nR_2}\} \to \mathcal{Y}^n.$$

The description $I$ equals $F_n(X^n, J)$, and the channel
output $Y^n$ equals $G_n(I, J)$.

Since randomized functions are specified by conditional
probability distributions, it is equivalent to say
that a $(2^{nR_1}, 2^{nR_2}, n)$ channel simulation code consists
of a conditional probability mass function $p(i, y^n | x^n, j)$
with the properties that $p(y^n | i, j, x^n) = p(y^n | i, j)$,
$|\mathcal{I}| = 2^{nR_1}$, and $|\mathcal{J}| = 2^{nR_2}$.

The induced joint distribution of a $(2^{nR_1}, 2^{nR_2}, n)$
channel simulation code is the joint distribution on
the quadruple $(X^n, Y^n, I, J)$. In other words, it is the
probability mass function,

$$p(x^n, y^n, i, j) = p(i, y^n | x^n, j)\, p(x^n, j),$$ (1)

where $p(x^n, j) = p(j) \prod_{k=1}^{n} p(x_k)$ by construction.

Definition 2: A sequence of $(2^{nR_1}, 2^{nR_2}, n)$ channel
simulation codes for $n = 1, 2, \ldots$ is said to achieve
$q(y|x)$ if the induced joint distributions have marginal
distributions $p(x^n, y^n)$ that satisfy

$$\lim_{n \to \infty} \left\| p(x^n, y^n) - \prod_{k=1}^{n} p(x_k) q(y_k | x_k) \right\|_1 = 0.$$

Definition 3: A rate pair $(R_1, R_2)$ is said to be achievable
if there exists a sequence of $(2^{nR_1}, 2^{nR_2}, n)$ channel
simulation codes that achieves $q(y|x)$.

Definition 4: The simulation rate region is the closure
of the set of achievable rate pairs $(R_1, R_2)$.



III. Main Result 

Theorem 3.1: For an i.i.d. source with distribution
$p(x)$ and a desired memoryless channel with conditional
distribution $q(y|x)$, the simulation rate region is the set

$$S = \{(R_1, R_2) \in \mathbb{R}^2 : \exists\, p(x,y,u) \in D \text{ s.t. } R_1 \geq I(X;U),\ R_1 + R_2 \geq I(X,Y;U)\},$$ (2)

where

$$D = \{p(x,y,u) : (X,Y) \sim p(x)q(y|x),\ X - U - Y \text{ form a Markov chain},\ |\mathcal{U}| \leq |\mathcal{X}||\mathcal{Y}| + 1\}.$$ (3)

IV. Observations and Examples 

Two extreme points of the simulation rate region S
fall directly from its definition. If $R_2 = 0$, the second
inequality in (2) dominates. Thus, the minimum rate
$R_1$ is the common information $C(X;Y)$. This coincides
with the intuition provided by Wyner's result in [2]. At
the other extreme, using the data processing inequality
on the first inequality of (2) yields $R_1 \geq I(X;Y)$
no matter how much common randomness is available,
and this is achieved when $R_2 \geq H(Y|X)$.² Source
coding results and the coordinated action work of Cover
and Permuter in [3] illustrate that with a description
rate of $I(X;Y)$ we can create a codebook of output
sequences in such a way that we'll likely be able to find
a jointly typical output sequence for each input sequence
from the source. Consequently, we can then randomize
the codebook using common randomness to actually
simulate the channel, as Bennett and Shor proved in [4].

A. Binary Erasure Channel 

For a Bernoulli-half source X, let us demonstrate the
simulation rate region for the binary erasure channel.
Y is an erasure with probability $P_e$ and is equal to
X otherwise. The distributions in D that produce the
boundary of the simulation rate region are formed by
cascading two binary erasure channels, with erasure
probabilities $p_1$ and $p_2$, as shown in Figure 3, where
$(1 - p_1)(1 - p_2) = 1 - P_e$ so that the cascade reproduces
the desired channel.

The mutual information terms in (2) become

$$I(X;U) = 1 - p_1,$$
$$I(X,Y;U) = h(P_e) + (1 - p_1)(1 - h(p_2)),$$

where h is the binary entropy function.

²$R_2$ doesn't necessarily have to be as large as $H(Y|X)$ for
$(I(X;Y), R_2)$ to be in the simulation rate region.
Fig. 3. The Markov chains $X - U - Y$ that give the boundary
of the simulation rate region for the binary erasure channel with a
Bernoulli-half input are formed by cascading two erasure channels.



Fig. 4. Boundary of the simulation rate region for a binary erasure
channel with erasure probability $P_e = 0.75$ and a Bernoulli-half input,
where $R_1$ is the description rate and $R_2$ is the rate of common randomness.
Without common randomness, a description rate of $C(X;Y)$ is
required to simulate the channel. With unlimited common randomness,
a description rate of $I(X;Y)$ suffices.



Figure 4 shows the boundary of the simulation rate
region for erasure probability $P_e = 0.75$. The required
description rate $R_1$ varies from $C(X;Y) = h(0.75) = 0.811$
bits to $I(X;Y) = 0.25$ bits as the rate of common
randomness runs between 0 and $H(Y|X) = h(0.75) = 0.811$
bits.
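
The endpoints quoted for $P_e = 0.75$ can be checked numerically. The sketch below assumes the cascade of two erasure channels with erasure probabilities $p_1$ and $p_2$ satisfying $(1 - p_1)(1 - p_2) = 1 - P_e$, for which $I(X;U) = 1 - p_1$ and $I(X,Y;U) = h(P_e) + (1 - p_1)(1 - h(p_2))$. The function names, the grid search, and the sweep range $p_2 \in [0, \min(1/2, P_e)]$ are illustrative assumptions, not the method used to produce Figure 4.

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cascade_terms(pe, p2):
    """Return (I(X;U), I(X,Y;U)) for a cascade whose second stage erases w.p. p2."""
    p1 = 1 - (1 - pe) / (1 - p2)      # enforces (1 - p1)(1 - p2) = 1 - pe
    i_xu = 1 - p1
    i_xyu = h(pe) + (1 - p1) * (1 - h(p2))
    return i_xu, i_xyu

def min_description_rate(pe, r2, steps=1000):
    """Smallest R1 found on a grid of cascades, given common randomness rate r2."""
    p2_max = min(0.5, pe)             # assumed sweep range for p2
    best = float("inf")
    for s in range(steps + 1):
        p2 = p2_max * s / steps
        i_xu, i_xyu = cascade_terms(pe, p2)
        # A pair (R1, r2) is admissible for this cascade when
        # R1 >= I(X;U) and R1 + r2 >= I(X,Y;U).
        best = min(best, max(i_xu, i_xyu - r2))
    return best

r1_no_randomness = min_description_rate(0.75, 0.0)        # about 0.811
r1_full_randomness = min_description_rate(0.75, h(0.75))  # about 0.25
```

Sweeping `r2` between 0 and `h(0.75)` traces an approximation of the boundary under these assumptions.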

V. Sketch of Converse 
Let $(R_1, R_2)$ be an achievable rate pair. Then for
each $\epsilon \in (0, 1/4)$ there exists a $(2^{nR_1}, 2^{nR_2}, n)$ channel
simulation code with an induced joint distribution
$p(x^n, y^n, i, j)$ such that

$$\left\| p(x^n, y^n) - \prod_{k=1}^{n} p(x_k) q(y_k | x_k) \right\|_1 < \epsilon.$$

Let the random variable K be uniformly distributed over
the set $\{1, \ldots, n\}$. The variable K will serve as a random
time index.

A. Entropy Bounds 

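The joint distribution of the sequences $(X^n, Y^n)$ is
close in total variation to an i.i.d. distribution, so we can
extend Lemma 2.7 of [6] to obtain two bounds:

$$\left| H(X^n, Y^n) - \sum_{k=1}^{n} H(X_k, Y_k) \right| \leq n g(\epsilon),$$ (4)

$$I(X_K, Y_K; K) \leq g(\epsilon),$$ (5)

where

$$g(\epsilon) \triangleq 4\epsilon \left( \log|\mathcal{X}| + \log|\mathcal{Y}| + \log\frac{1}{\epsilon} \right).$$ (6)

Notice that $\lim_{\epsilon \downarrow 0} g(\epsilon) = 0$.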
B. Epsilon Rate Region 

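Define an epsilon rate region,

$$S_\epsilon = \{(R_1, R_2) \in \mathbb{R}^2 : \exists\, p(x,y,u) \in D_\epsilon \text{ s.t. } R_1 \geq I(X;U) - 2g(\epsilon),\ R_1 + R_2 \geq I(X,Y;U) - 2g(\epsilon)\},$$

where

$$D_\epsilon = \{p(x,y,u) : \|p(x,y) - p(x)q(y|x)\|_1 < \epsilon,\ X - U - Y \text{ form a Markov chain},\ |\mathcal{U}| \leq |\mathcal{X}||\mathcal{Y}| + 1\}.$$ (7)

Lemma 5.1: If $(R_1, R_2)$ is achievable, then $(R_1, R_2) \in S_\epsilon$
for every $\epsilon \in (0, 1/4)$.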



Proof: We use familiar information theoretic inequalities,
and the fact that $X^n$ and $J$ are independent,
to bound $R_1$ and the sum rate $R_1 + R_2$.

$$nR_1 \geq H(I) \geq H(I|J) \geq I(X^n; I | J) = I(X^n; I, J).$$ (8)

$$n(R_1 + R_2) \geq H(I, J) \geq I(X^n, Y^n; I, J).$$ (9)

We then lower bound the r.h.s. of (8) and (9) using
similar steps. Here we proceed from (9).

$$I(X^n, Y^n; I, J) = H(X^n, Y^n) - H(X^n, Y^n | I, J)$$
$$\geq H(X^n, Y^n) - \sum_{k=1}^{n} H(X_k, Y_k | I, J)$$
$$\geq \sum_{k=1}^{n} I(X_k, Y_k; I, J) - n g(\epsilon)$$
$$= n I(X_K, Y_K; I, J | K) - n g(\epsilon)$$
$$\geq n I(X_K, Y_K; I, J, K) - 2n g(\epsilon).$$

The second inequality comes from (4), and the last
inequality comes from (5).

The joint distribution of the pair $(X_K, Y_K)$ can be
shown to satisfy the total variation constraint in (7).
Finally, we acknowledge the Markovity of the triple
$X_K - (I, J, K) - Y_K$ to complete the proof of the
lemma. (The cardinality bound of $\mathcal{U}$ in (7) is shown to
be satisfiable via a generalized Caratheodory theorem.)

■

C. Lower semi-continuity

The epsilon rate regions decrease to the simulation
rate region as epsilon decreases to zero.

Lemma 5.2:

$$\bigcap_{\epsilon \in (0, 1/2)} S_\epsilon \subseteq S.$$

VI. Sketch of Achievability
A. Resolvability

One key tool for the achievability proof is summarized
in Lemma 6.1. This lemma is implied by the resolvability
work of Han and Verdú in [7], but the concept was first
introduced by Wyner in Theorem 6.3 of [2].

Lemma 6.1: For any discrete distribution $p(u,v)$ and
each n, let $C^{(n)} = \{U^n(m)\}_{m=1}^{2^{nR}}$ be a "codebook"
of sequences each independently drawn according to
$\prod_{k=1}^{n} p_U(u_k)$.

For a fixed codebook, define the distribution

$$Q(v^n) = 2^{-nR} \sum_{m=1}^{2^{nR}} \prod_{k=1}^{n} p_{V|U}(v_k | U_k(m)).$$

Then if $R > I(V;U)$,

$$\lim_{n \to \infty} E \left\| Q(v^n) - \prod_{k=1}^{n} p_V(v_k) \right\|_1 = 0,$$

where the expectation is with respect to the randomly
constructed codebooks $C^{(n)}$.

B. Existence of Achievable Codes

Assume that $(R_1, R_2)$ is in the interior of S. Then
there exists a distribution $p^*(x,y,u) \in D$ such that $R_1 > I(X;U)$
and $R_1 + R_2 > I(X,Y;U)$.

For each n, let $(I, J)$ be uniformly distributed on
$\{1, \ldots, 2^{nR_1}\} \times \{1, \ldots, 2^{nR_2}\}$. We apply Lemma 6.1
twice, once with $V = (X, Y)$ and again with $V = X$,
to assert that there exists a sequence of "codebooks"
$C^{(n)} = \{U^n(i,j)\}_{(i,j) \in \mathcal{I} \times \mathcal{J}}$, $n = 1, 2, \ldots$, with the
properties

$$\lim_{n \to \infty} \left\| Q(x^n, y^n) - \prod_{k=1}^{n} p(x_k) q(y_k | x_k) \right\|_1 = 0,$$ (10)

$$\lim_{n \to \infty} \left\| Q(x^n, j) - p(j) \prod_{k=1}^{n} p(x_k) \right\|_1 = 0,$$ (11)



where $Q(x^n, y^n)$ and $Q(x^n, j)$ are marginal distributions
derived from the joint distribution

$$Q(x^n, y^n, i, j) = p(i, j) \prod_{k=1}^{n} p^*_{X,Y|U}(x_k, y_k | U_k(i,j)).$$



In an indirect way, we've constructed a sequence of
joint distributions $Q(x^n, y^n, i, j)$ from which we can
derive channel simulation codes that achieve $q(y|x)$.
The Markovity of $p^*$ implies the Markov property
$Q(x^n, y^n | i, j) = Q(x^n | i, j) Q(y^n | i, j)$. Let

$$p(i | x^n, j) = Q(i | x^n, j),$$
$$p(y^n | i, j) = Q(y^n | i, j).$$

Considering (10) and (11) with the properties of total
variation and $p^*$ in mind, it can be shown that
$p(i, y^n | x^n, j) = p(i | x^n, j) p(y^n | i, j)$ is a sequence of
channel simulation codes that achieves $q(y|x)$.
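
The resolvability statement of Lemma 6.1 can be explored numerically for very short block lengths. The sketch below is only an illustration under assumed parameters: U is taken to be Bernoulli-half, V is U passed through a binary symmetric channel, and the function name, seed, and block length are hypothetical choices. The lemma is asymptotic, so small n gives a qualitative picture at best.

```python
import itertools
import random

def soft_covering_distance(n, rate, crossover, seed=0):
    """L1 distance between the codebook-induced Q(v^n) and the i.i.d. target.

    U is Bernoulli-half and V is U passed through a binary symmetric channel
    (both are illustrative assumptions, not taken from the text).
    """
    rng = random.Random(seed)
    size = max(1, round(2 ** (n * rate)))
    # Codebook of `size` sequences, each drawn i.i.d. Bernoulli-half.
    codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(size)]
    total = 0.0
    for v in itertools.product((0, 1), repeat=n):
        q = 0.0
        for u in codebook:
            flips = sum(a != b for a, b in zip(u, v))
            q += (crossover ** flips) * ((1 - crossover) ** (n - flips))
        q /= size
        total += abs(q - 0.5 ** n)   # the target p_V is uniform on {0, 1}
    return total

# Distance for a codebook rate above I(V;U) = 1 - h(0.1), about 0.53 bits.
distance = soft_covering_distance(n=8, rate=0.75, crossover=0.1)
```

When the crossover probability is 1/2 the output is independent of the codeword, so the distance is zero for any codebook; this serves as a quick sanity check of the computation.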

C. Comment on Achievability Scheme 

This channel simulation scheme requires randomization
at both the encoder and decoder. In essence, a
codebook of independently drawn $U^n$ sequences is overpopulated
so that the encoder can choose one randomly
from many that are jointly typical with $X^n$. The decoder
then randomly generates $Y^n$ conditioned on $U^n$.

VII. Game Theory 

Our framework finds motivation in a game theoretic
setting. Consider a zero-sum repeated game between
two teams. Team A consists of two players who on the
ith iteration take actions $X_i \in \mathcal{X}$ and $Y_i \in \mathcal{Y}$. The
opponents on team B take combined action $Z_i \in \mathcal{Z}$.
All action spaces $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ are finite. The payoff for
team A at each iteration is a time-invariant finite function
$\Pi(X_i, Y_i, Z_i)$ and is the loss for team B. Each team
wishes to maximize its time-averaged expected payoff.

Assume that team A plays conservatively, attempting
to maximize the expected payoff for the worst-case
actions of team B. Then the payoff at the ith iteration is

$$\Theta_i \triangleq \min_{z \in \mathcal{Z}} E\left[ \Pi(X_i, Y_i, z) \,\middle|\, X^{i-1}, Y^{i-1} \right].$$ (12)

Clearly, (12) could be maximized by finding an
optimal mixed strategy $p^*(x,y)$ that maximizes
$\min_{z \in \mathcal{Z}} E[\Pi(X, Y, z)]$ and choosing independent
actions each iteration. This would correspond to the
minimax strategy. However, now we introduce a new
constraint: The players on team A have a limited secure
channel of communication. Player 1, who chooses the
actions $X^n$, communicates at rate R to Player 2, who
chooses $Y^n$.
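
To make the worst-case payoff concrete, the following sketch evaluates $\min_{z} E[\Pi(X, Y, z)]$ for a given joint strategy of team A. The payoff function and both strategies are hypothetical examples chosen only to show why correlated actions can matter; they are not drawn from the text.

```python
def worst_case_payoff(strategy, payoff, opponent_actions):
    """min over z of E[payoff(X, Y, z)] for a joint pmf {(x, y): probability}."""
    return min(
        sum(pr * payoff(x, y, z) for (x, y), pr in strategy.items())
        for z in opponent_actions
    )

# Hypothetical game: team A scores 1 when both players pick the same bit
# and the opponent fails to guess that bit.
def payoff(x, y, z):
    return 1.0 if (x == y and x != z) else 0.0

# Perfectly correlated actions versus independent fair coin flips.
correlated = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

payoff_correlated = worst_case_payoff(correlated, payoff, (0, 1))
payoff_independent = worst_case_payoff(independent, payoff, (0, 1))
```

In this toy game the correlated strategy guarantees twice the payoff of the independent one, which is the kind of gap that communication between the players is meant to close.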

Let U be the message passed from Player 1 to Player
2. We say a rate R is achievable for payoff $\Theta$ if there
exists a sequence of random variable triples $(X^n, Y^n, U)$
that each form Markov chains $X^n - U - Y^n$ ³ and such
that

$$\lim_{n \to \infty} E\left[ \frac{1}{n} \sum_{i=1}^{n} \Theta_i \right] \geq \Theta.$$ (13)

³This Markov chain requirement can be relaxed to the more physically
relevant requirement that $X_k - (U, X^{k-1}, Y^{k-1}) - Y_k$ for all k.



Let $R(\Theta)$ be the infimum of achievable rates for
payoff $\Theta$. We claim that $R(\Theta)$ is the least average
common information of all combinations of strategies
that achieve average payoff $\Theta$. Define,

$$R_0(\Theta) = \min C(X;Y|W) \quad \text{s.t.} \quad E\left[ \min_{z \in \mathcal{Z}} E[\Pi(X, Y, z) | W] \right] \geq \Theta.$$

Theorem 7.1:

$$R(\Theta) = R_0(\Theta).$$

Converse Sketch:
The important elements of the converse are the inequalities

$$n(R(\Theta) + \epsilon) \geq H(U)$$
$$\geq I(X^n, Y^n; U)$$
$$= \sum_{i=1}^{n} I(X_i, Y_i; U | X^{i-1}, Y^{i-1})$$
$$= n I(X_K, Y_K; U | X^{K-1}, Y^{K-1}, K),$$

for all $\epsilon > 0$, where K is uniformly distributed on
$\{1, \ldots, n\}$. Now identify the tuple $(X^{K-1}, Y^{K-1}, K)$ as
the auxiliary random variable W.

Achievability Comment: 
The random variable W serves as a time sharing variable 
to combine strategies of high and low correlation. 